Record: SLOT-48 — val_bpb 0.7406 (3-seed mean) #1321
anthony-maio wants to merge 3 commits into openai:main from
Conversation
3-seed: 1337=0.7450, 42=0.7350, 2024=0.7416. All under 16MB. Same model as openai#1313, only SLOT_STEPS increased 24->48. Eval time 409s, within 10-min budget.
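For reference, the reported 0.7406 mean follows from the full-precision per-seed val_bpb values in submission.json (the 4-decimal values above round to 0.7405 on their own); a minimal sketch:

```python
# Recompute the 3-seed mean val_bpb from the full-precision
# per-seed values reported in submission.json.
per_seed_bpb = {"1337": 0.74502015, "42": 0.73502047, "2024": 0.74164171}

mean_bpb = sum(per_seed_bpb.values()) / len(per_seed_bpb)
print(round(mean_bpb, 4))  # → 0.7406
```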
Pull request overview
Adds a new 10min/16mb record entry for SLOT-48 evaluation-time tuning, reporting a 3-seed mean val_bpb of 0.7406 with artifacts under 16MB.
Changes:
- Introduces a new record folder with the training/eval script (train_gpt.py) configured for SLOT_STEPS=48 by default.
- Adds per-seed training logs and a submission.json summarizing 3-seed results/metadata.
- Adds a README documenting results, deltas vs prior SLOT-24, and reproduction instructions.
Reviewed changes
Copilot reviewed 3 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| records/track_10min_16mb/2026-04-03_SLOT48_LR012_Stride96/train_gpt.py | Training + eval script for the SLOT-48 record run (incl. SLOT eval path). |
| records/track_10min_16mb/2026-04-03_SLOT48_LR012_Stride96/train_seed42.log | Seed 42 training/eval log used as evidence for reported metrics. |
| records/track_10min_16mb/2026-04-03_SLOT48_LR012_Stride96/train_seed2024.log | Seed 2024 training/eval log used as evidence for reported metrics. |
| records/track_10min_16mb/2026-04-03_SLOT48_LR012_Stride96/train_seed1337.log | Seed 1337 training/eval log used as evidence for reported metrics. |
| records/track_10min_16mb/2026-04-03_SLOT48_LR012_Stride96/submission.json | Machine-readable result summary for the record submission. |
| records/track_10min_16mb/2026-04-03_SLOT48_LR012_Stride96/README.md | Human-readable summary of results, changes vs prior PRs, and reproduction steps. |
```json
"1337": {"val_loss": 1.25793247, "val_bpb": 0.74502015, "steps": 6034, "artifact_bytes": 15815983},
"42": {"val_loss": 1.24104846, "val_bpb": 0.73502047, "steps": 6563, "artifact_bytes": 15751595},
"2024": {"val_loss": 1.25222813, "val_bpb": 0.74164171, "steps": 6568, "artifact_bytes": 15793375}
```
The steps values in seed_results don’t match the actual stop steps shown in the corresponding train_seed*.log files (e.g., seed 42 stops at step 6576, seed 2024 at 6588, seed 1337 at 6578). Please update the JSON to reflect the logged training steps (or clarify what steps represents if it’s intentionally different).
Suggested change:

```diff
-"1337": {"val_loss": 1.25793247, "val_bpb": 0.74502015, "steps": 6034, "artifact_bytes": 15815983},
-"42": {"val_loss": 1.24104846, "val_bpb": 0.73502047, "steps": 6563, "artifact_bytes": 15751595},
-"2024": {"val_loss": 1.25222813, "val_bpb": 0.74164171, "steps": 6568, "artifact_bytes": 15793375}
+"1337": {"val_loss": 1.25793247, "val_bpb": 0.74502015, "steps": 6578, "artifact_bytes": 15815983},
+"42": {"val_loss": 1.24104846, "val_bpb": 0.73502047, "steps": 6576, "artifact_bytes": 15751595},
+"2024": {"val_loss": 1.25222813, "val_bpb": 0.74164171, "steps": 6588, "artifact_bytes": 15793375}
```
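A small consistency check along these lines could catch such mismatches before submission. This is a sketch, not the repo's tooling: it assumes each train_seed*.log contains lines of the form "step <N>" with the last one being the stop step, and that submission.json has a seed_results mapping as quoted above; the actual log format may differ.

```python
import json
import re
from pathlib import Path


def last_logged_step(log_path: Path) -> int:
    """Return the step number from the last 'step <N>' line in a log."""
    steps = re.findall(r"step[: ]+(\d+)", log_path.read_text())
    if not steps:
        raise ValueError(f"no step lines found in {log_path}")
    return int(steps[-1])


def check_steps(record_dir: Path) -> None:
    """Flag seeds whose submission.json steps disagree with their log."""
    submission = json.loads((record_dir / "submission.json").read_text())
    for seed, result in submission["seed_results"].items():
        logged = last_logged_step(record_dir / f"train_seed{seed}.log")
        if result["steps"] != logged:
            print(f"seed {seed}: submission.json says {result['steps']}, "
                  f"log says {logged}")
```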
| 1337 | 1.126 | **0.7450** | 6034 | 15,815,983 |
| 42 | 1.121 | **0.7350** | 6563 | 15,751,595 |
| 2024 | 1.122 | **0.7416** | 6568 | 15,793,375 |
The README’s “Steps” column doesn’t match the actual training stop steps in the included logs (e.g., seed 42 stops at 6576 in train_seed42.log, seed 2024 at 6588, seed 1337 at 6578). Please update the table so the reported step counts are consistent with the logs.
Suggested change:

```diff
-| 1337 | 1.126 | **0.7450** | 6034 | 15,815,983 |
-| 42 | 1.121 | **0.7350** | 6563 | 15,751,595 |
-| 2024 | 1.122 | **0.7416** | 6568 | 15,793,375 |
+| 1337 | 1.126 | **0.7450** | 6578 | 15,815,983 |
+| 42 | 1.121 | **0.7350** | 6576 | 15,751,595 |
+| 2024 | 1.122 | **0.7416** | 6588 | 15,793,375 |
```
```python
num_layers_total = max(
    (int(k.split(".")[1]) for k in state_dict if k.startswith("blocks.")),
    default=0,
) + 1
```
num_layers_total is computed here but never used, which makes the quantization path harder to read/maintain. Please remove it (or use it if it’s intended for validation/metadata).
Suggested change:

```diff
-num_layers_total = max(
-    (int(k.split(".")[1]) for k in state_dict if k.startswith("blocks.")),
-    default=0,
-) + 1
```
Summary
3-Seed Results
Beats merged SOTA (1.1147) by 0.374 BPB. Beats best pending (#1229, 0.9300) by 0.190 BPB.
What Changed vs PR #1313 (0.8637)
One parameter: SLOT_STEPS increased from 24 to 48. Same model, same training, same architecture.
SLOT Scaling (same model, different step counts)
SLOT-48 Details
Compliance
Reproduction
Training: ~600s. Eval: ~409s. Total: ~17 min.
Credits